14 research outputs found

    Matching the Efficiency Gains of the Logistic Regression Estimator While Avoiding its Interpretability Problems, in Randomized Trials

    Adjusting for prognostic baseline variables can lead to improved power in randomized trials. For binary outcomes, a logistic regression estimator is commonly used for such adjustment. This has resulted in substantial efficiency gains in practice, e.g., gains equivalent to reducing the required sample size by 20-28% were observed in a recent survey of traumatic brain injury trials. Robinson and Jewell (1991) proved that the logistic regression estimator is guaranteed to have equal or better asymptotic efficiency compared to the unadjusted estimator (which ignores baseline variables). Unfortunately, the logistic regression estimator has the following dangerous vulnerabilities: it is only interpretable when the treatment effect is identical within every stratum of baseline covariates; also, it is inconsistent under model misspecification, which is virtually guaranteed when the baseline covariates are continuous or categorical with many levels. An open problem was whether there exists an equally powerful, covariate-adjusted estimator with no such vulnerabilities, i.e., one that (i) is interpretable and consistent without requiring any model assumptions, and (ii) matches the efficiency gains of the logistic regression estimator. Such an estimator would provide the best of both worlds: interpretability and consistency under no model assumptions (like the unadjusted estimator) and power gains from covariate adjustment (that match the logistic regression estimator). We prove a new asymptotic result showing that, surprisingly, there are simple estimators satisfying the above properties. We argue that these rarely used estimators have substantial advantages over the more commonly used logistic regression estimator for covariate adjustment in randomized trials with binary outcomes. Though our focus is binary outcomes and logistic regression models, our results extend to a large class of generalized linear models.
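    To make the contrast concrete, here is a minimal R sketch of one such simple estimator: standardization (g-computation) of the marginal risk difference from a working logistic model. The simulated data, variable names, and working model are illustrative assumptions, not the paper's method or software.

        # Simulated trial: binary outcome y, randomized treatment a, baseline covariate x
        set.seed(1)
        n <- 500
        x <- rnorm(n)
        a <- rbinom(n, 1, 0.5)
        y <- rbinom(n, 1, plogis(-0.5 + 0.8 * a + 0.6 * x))

        # Working logistic model; under simple randomization, the standardized
        # estimate below remains consistent for the marginal risk difference
        # even if this working model is misspecified
        fit <- glm(y ~ a + x, family = binomial)

        # Standardization (g-computation): average the predicted risks with
        # everyone set to treatment, then to control, and take the difference
        p1 <- mean(predict(fit, newdata = data.frame(a = 1, x = x), type = "response"))
        p0 <- mean(predict(fit, newdata = data.frame(a = 0, x = x), type = "response"))
        risk_difference <- p1 - p0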

    Censoring Unbiased Regression Trees and Ensembles

    This paper proposes a novel approach to building regression trees and ensemble learning in survival analysis. By first extending the theory of censoring unbiased transformations, we construct observed data estimators of full data loss functions in cases where responses can be right censored. This theory is used to construct two specific classes of methods for building regression trees and regression ensembles that respectively make use of Buckley-James and doubly robust estimating equations for a given full data risk function. For the particular case of squared error loss, we further show how to implement these algorithms using existing software (e.g., CART, random forests) by making use of a related form of response imputation. Comparisons of these methods to existing ensemble procedures for predicting survival probabilities are provided in both simulated settings and through applications to four datasets. It is shown that these new methods either improve upon, or remain competitive with, existing implementations of random survival forests, conditional inference forests, and recursively imputed survival trees.
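    As a flavor of the response-imputation route, the R sketch below uses a simple inverse-probability-of-censoring-weighted (IPCW) transformation, an elementary censoring unbiased transformation rather than the paper's Buckley-James or doubly robust versions, and feeds the transformed response to an off-the-shelf regression tree. The simulated data and truncation constant are illustrative assumptions.

        library(survival)
        library(rpart)

        # Simulated right-censored data: event time evt, censoring time cens, covariate x
        set.seed(2)
        n <- 400
        x <- runif(n)
        evt <- rexp(n, rate = exp(-x))
        cens <- rexp(n, rate = 0.3)
        time <- pmin(evt, cens)
        status <- as.numeric(evt <= cens)
        tau <- quantile(time, 0.9)  # restriction point for min(T, tau)

        # Kaplan-Meier estimate of the censoring survivor function G
        km_cens <- survfit(Surv(time, 1 - status) ~ 1)
        G <- stepfun(km_cens$time, c(1, km_cens$surv))

        # Simple IPCW transformation: events are upweighted by 1/G(time), censored
        # observations set to 0; unbiased for E[min(T, tau) | x] when censoring is
        # independent of the event time and covariates (0.05 is a crude truncation)
        y_star <- ifelse(status == 1, pmin(time, tau) / pmax(G(time), 0.05), 0)

        # Fit a standard regression tree to the transformed response
        tree <- rpart(y_star ~ x)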

    Improving Precision by Adjusting for Baseline Variables in Randomized Trials with Binary Outcomes, without Regression Model Assumptions

    In randomized clinical trials with baseline variables that are prognostic for the primary outcome, there is potential to improve precision and reduce sample size by appropriately adjusting for these variables. A major challenge is that there are multiple statistical methods to adjust for baseline variables, but little guidance on which is best to use in a given context. The choice of method can have important consequences. For example, one commonly used method leads to uninterpretable estimates if there is any treatment effect heterogeneity, which would jeopardize the validity of trial conclusions. We give practical guidance on how to avoid this problem, while retaining the advantages of covariate adjustment. This can be achieved by using simple (but less well-known) standardization methods from the recent statistics literature. We discuss these methods and give software in R and Stata implementing them. A data example from a recent stroke trial is used to illustrate these methods.
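    Continuing the standardization sketch under the first abstract above, the same standardized risks yield several marginal contrasts that remain interpretable under treatment effect heterogeneity. The values below are purely illustrative stand-ins for estimates p1 and p0, not output of the paper's software.

        # Illustrative stand-ins for standardized marginal risks under
        # treatment (p1) and control (p0)
        p1 <- 0.45
        p0 <- 0.35

        # Marginal contrasts that stay interpretable under treatment effect
        # heterogeneity, unlike a conditional logistic regression coefficient
        risk_difference <- p1 - p0
        risk_ratio      <- p1 / p0
        odds_ratio      <- (p1 / (1 - p1)) / (p0 / (1 - p0))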

    Optimized Adaptive Enrichment Designs for Multi-Arm Trials: Learning Which Subpopulations Benefit from Different Treatments

    We consider the problem of designing a randomized trial for comparing two treatments versus a common control in two disjoint subpopulations. The subpopulations could be defined in terms of a biomarker or disease severity measured at baseline. The goal is to determine which treatments benefit which subpopulations. We develop a new class of adaptive enrichment designs tailored to solving this problem. Adaptive enrichment designs involve a preplanned rule for modifying enrollment based on accruing data in an ongoing trial. The proposed designs have preplanned rules for stopping accrual of treatment by subpopulation combinations, either for efficacy or futility. The motivation for this adaptive feature is that interim data may indicate that a subpopulation, such as those with lower disease severity at baseline, is unlikely to benefit from a particular treatment while uncertainty remains for the other treatment and/or subpopulation. We optimize these adaptive designs to have the minimum expected sample size under power and Type I error constraints. We compare the performance of the optimized adaptive design versus an optimized non-adaptive (single stage) design. Our approach is demonstrated in simulation studies that mimic features of a completed trial of a medical device for treating heart failure. The optimized adaptive design has 25% smaller expected sample size compared to the optimized non-adaptive design; however, the cost is that the optimized adaptive design has 8% greater maximum sample size. Open-source software that implements the trial design optimization is provided, allowing users to investigate the tradeoffs in using the proposed adaptive versus standard designs.
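    The expected-versus-maximum sample size tradeoff can be illustrated with a small Monte Carlo sketch in R for a single treatment-by-subpopulation combination. The futility boundary, stage sizes, and effect size below are hypothetical, not the optimized values from the paper.

        # Two-stage design for one treatment-by-subpopulation combination:
        # stop accrual for futility if the stage 1 z-statistic falls below
        # z_fut, otherwise enroll a second stage of equal size
        set.seed(3)
        reps <- 1e5
        n_stage <- 100   # patients per stage (both arms combined)
        delta <- 0.2     # hypothetical standardized treatment effect
        z_fut <- 0       # hypothetical stage 1 futility boundary

        # Stage 1 z-statistic for a two-arm comparison, n_stage / 2 per arm
        z1 <- rnorm(reps, mean = delta * sqrt(n_stage / 4))
        stop_early <- z1 < z_fut
        expected_n <- mean(ifelse(stop_early, n_stage, 2 * n_stage))
        max_n <- 2 * n_stage   # a single-stage design always enrolls this many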

    Comparison of Adaptive Randomized Trial Designs for Time-to-Event Outcomes that Expand versus Restrict Enrollment Criteria, to Test Non-Inferiority

    Adaptive enrichment designs involve preplanned rules for modifying patient enrollment criteria based on data accrued in an ongoing trial. These designs may be useful when it is suspected that a subpopulation, e.g., defined by a biomarker or risk score measured at baseline, may benefit more from treatment than the complementary subpopulation. We compare two types of such designs, for the case of two subpopulations that partition the overall population. The first type starts by enrolling the subpopulation where it is suspected the new treatment is most likely to work, and then may expand inclusion criteria if there is early evidence of a treatment benefit. The second type starts by enrolling from the overall population and then may selectively restrict enrollment if sufficient evidence accrues that the treatment is not benefiting a subpopulation. We construct two-stage designs of each type that guarantee strong control of the familywise Type I error rate, asymptotically. We then compare performance of the designs from each type under different scenarios; the scenarios mimic key features of a completed non-inferiority trial of HIV treatments. Performance criteria include power, sample size, Type I error, estimator bias, and confidence interval coverage probability.
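    For intuition on strong familywise error control, here is a toy R check under the global null using a simple Bonferroni alpha split across the two subpopulations; the designs in the paper derive their boundaries differently, so this is only a sketch of the criterion being controlled.

        # Estimate the familywise Type I error: the probability of rejecting
        # at least one subpopulation null when both nulls are true
        set.seed(4)
        reps <- 1e5
        alpha <- 0.025                  # overall one-sided level
        z_crit <- qnorm(1 - alpha / 2)  # Bonferroni: alpha / 2 per subpopulation

        z_sub1 <- rnorm(reps)  # null z-statistics, assumed independent because
        z_sub2 <- rnorm(reps)  # the two subpopulations enroll disjoint patients
        fwer <- mean(z_sub1 > z_crit | z_sub2 > z_crit)  # approximately <= 0.025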

    Doubly Robust Survival Trees


    Estimation in the Semiparametric Accelerated Failure Time Model With Missing Covariates: Improving Efficiency Through Augmentation

    This article considers linear regression with missing covariates and a right censored outcome. We first consider a general two-phase outcome sampling design, where full covariate information is only ascertained for subjects in phase two and sampling occurs under an independent Bernoulli sampling scheme with known subject-specific sampling probabilities that depend on phase one information (e.g., survival time, failure status and covariates). The semiparametric information bound is derived for estimating the regression parameter in this setting. We also introduce a more practical class of augmented estimators that is shown to improve asymptotic efficiency over simple but inefficient inverse probability of sampling weighted estimators. Estimation for known sampling weights and extensions to the case of estimated sampling weights are both considered. The allowance for estimated sampling weights permits covariates to be missing at random according to a monotone but unknown mechanism. The asymptotic properties of the augmented estimators are derived and simulation results demonstrate substantial efficiency improvements over simpler inverse probability of sampling weighted estimators in the indicated settings. With suitable modification, the proposed methodology can also be used to improve augmented estimators previously used for missing covariates in a Cox regression model. Supplementary materials for this article are available online.
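    The role of augmentation can be seen in the simplest possible case: estimating a mean from phase two data with known Bernoulli sampling probabilities. This R sketch is far simpler than the paper's censored regression setting, and the data-generating model and working regression are illustrative assumptions.

        # Phase one variable v is always observed; y is observed only when
        # sampled into phase two (r = 1) with known probability p_samp(v)
        set.seed(5)
        n <- 2000
        v <- rnorm(n)
        y <- v + rnorm(n)
        p_samp <- plogis(v)
        r <- rbinom(n, 1, p_samp)

        # Simple inverse probability of sampling weighted (IPW) estimator of E[y]
        ipw <- mean(r * y / p_samp)

        # Augmented estimator: subtract a mean-zero term built from a working
        # regression of y on phase one data, which typically shrinks the variance
        fit <- lm(y ~ v, subset = r == 1, weights = 1 / p_samp)
        m <- predict(fit, newdata = data.frame(v = v))
        aipw <- mean(r * y / p_samp - (r - p_samp) / p_samp * m)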

    Censoring Unbiased Regression Trees and Ensembles

    This article proposes a novel paradigm for building regression trees and ensemble learning in survival analysis. Generalizations of the classification and regression trees (CART) and random forests (RF) algorithms for general loss functions, and in the latter case more general bootstrap procedures, are both introduced. These results, in combination with an extension of the theory of censoring unbiased transformations (CUTs) applicable to loss functions, underpin the development of two new classes of algorithms for constructing survival trees and survival forests: censoring unbiased regression trees and censoring unbiased regression ensembles. For a certain “doubly robust” CUT of squared error loss, we further show how these new algorithms can be implemented using existing software (e.g., CART, RF). Comparisons of these methods to existing ensemble procedures for predicting survival probabilities are provided in both simulated settings and through applications to four datasets. It is shown that these new methods either improve upon, or remain competitive with, existing implementations of random survival forests, conditional inference forests, and recursively imputed survival trees.